The presence of high-density objects (e.g., metal implants and dental fillings) can introduce severe streak-like artifacts in computed tomography (CT) images, greatly limiting subsequent diagnosis. Although various deep-learning-based methods for metal artifact reduction (MAR) have been proposed, their performance is often unsatisfactory due to the limited exploitation of global context in the sinogram domain, secondary artifacts introduced in the image domain, and the requirement for precise metal masks. To address these issues, this paper explores fast Fourier convolution for MAR in both the sinogram and image domains and proposes a Fourier dual-domain network for MAR, termed FD-MAR. Specifically, we first propose a Fourier sinogram restoration network, which can leverage sinogram-wide receptive context to fill in metal-corrupted regions from uncorrupted regions and is thus robust to the metal trace. Second, we propose a Fourier refinement network in the image domain, which can refine the reconstructed images in a local-to-global manner by exploring image-wide contextual information. As a result, the proposed FD-MAR can exploit sinogram- and image-wide receptive fields for MAR. Optimized with a composite loss function, FD-MAR is shown by extensive experiments to outperform state-of-the-art MAR methods in terms of quantitative metrics and visual comparison. Notably, FD-MAR does not require precise metal masks, which is of great importance in clinical routine.
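As a rough illustration of the fast-Fourier-convolution idea this abstract relies on, the PyTorch sketch below combines a local convolution branch with a spectral branch that convolves the stacked real and imaginary parts of the 2-D FFT, giving an image-wide (or sinogram-wide) receptive field. The module and parameter names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpectralConvBlock(nn.Module):
    """Minimal sketch of a fast-Fourier-convolution-style block: a local 3x3
    branch plus a global branch that mixes channels in the frequency domain."""

    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        # 1x1 conv applied to the stacked real/imaginary parts of the spectrum
        self.freq = nn.Conv2d(channels * 2, channels * 2, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Global branch: FFT -> channel mixing -> inverse FFT
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = torch.cat([spec.real, spec.imag], dim=1)
        spec = self.act(self.freq(spec))
        real, imag = spec.chunk(2, dim=1)
        glob = torch.fft.irfft2(torch.complex(real, imag),
                                s=x.shape[-2:], norm="ortho")
        # Sum of local and global (spectral) branches
        return self.act(self.local(x) + glob)
```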
Real-time semantic segmentation has played an important role in intelligent vehicle scenarios. Recently, numerous networks have incorporated information from multi-size receptive fields to facilitate feature extraction in real-time semantic segmentation tasks. However, these methods preferentially adopt massive receptive fields to elicit more contextual information, which may result in inefficient feature extraction. We believe that the elaborated receptive fields are crucial, considering the demand for efficient feature extraction in real-time tasks. Therefore, we propose an effective and efficient architecture termed Dilation-wise Residual segmentation (DWRSeg), which possesses different sets of receptive field sizes within different stages. The architecture involves (i) a Dilation-wise Residual (DWR) module for extracting features based on different scales of receptive fields in the high level of the network; (ii) a Simple Inverted Residual (SIR) module that uses an inverted bottleneck structure to extract features from the low stage; and (iii) a simple fully convolutional network (FCN)-like decoder for aggregating multiscale feature maps to generate the prediction. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of our method by achieving a state-of-the-art trade-off between accuracy and inference speed, in addition to being lighter weight. Without using pretraining or resorting to any training trick, we achieve 72.7% mIoU on the Cityscapes test set at a speed of 319.5 FPS on one NVIDIA GeForce GTX 1080 Ti card, which is significantly faster than existing methods. The code and trained models are publicly available.
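To make the dilation-wise idea concrete, here is a minimal PyTorch sketch of a residual block whose parallel 3x3 branches use different dilation rates, so each stage can be assigned its own set of receptive-field sizes. The branch count, dilation rates, and layer names are assumptions chosen for illustration; this is not the released DWRSeg code.

```python
import torch
import torch.nn as nn


class DWRBlock(nn.Module):
    """Sketch of a dilation-wise residual idea: parallel dilated branches
    cover a controlled set of receptive-field sizes, then a 1x1 conv fuses
    them back before the residual addition."""

    def __init__(self, channels, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.cat([b(x) for b in self.branches], dim=1)
        return torch.relu(x + self.bn(self.fuse(out)))
```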
In this paper, we consider a color-plus-mono dual-camera system and propose an end-to-end convolutional neural network to align and fuse the two images in an efficient and cost-effective way. Our method takes cross-domain and cross-scale images as input and accordingly synthesizes HR colorization results to facilitate the trade-off between spatiotemporal resolution and color depth in single-camera imaging systems. In contrast to previous colorization methods, ours adapts to color and monochrome cameras with distinct spatiotemporal resolutions, enabling flexibility and robustness in practical applications. The key ingredient of our method is a cross-camera alignment module that generates multi-scale correspondences for cross-domain image alignment. Through extensive experiments on various datasets and under multiple settings, we validate the flexibility and effectiveness of our approach. Remarkably, our method consistently achieves substantial improvements, i.e., around a 10 dB PSNR gain, over state-of-the-art methods. The code is at: https://github.com/indigopurple/ccdc
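The basic operation behind such a cross-camera alignment module is warping one view toward the other with a dense correspondence field. A minimal sketch of that warping step is shown below, assuming a per-pixel flow given in pixel units; the function name and conventions are illustrative, not the paper's actual module.

```python
import torch
import torch.nn.functional as F


def warp_with_flow(src, flow):
    """Warp src (B, C, H, W) toward another view using a dense flow field
    (B, 2, H, W) in pixel units, the core step of cross-camera alignment."""
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=src.device),
                            torch.arange(w, device=src.device), indexing="ij")
    # Base sampling grid shifted by the flow
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow  # (B, 2, H, W)
    # Normalize to [-1, 1] as required by grid_sample
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(src, grid, mode="bilinear", align_corners=True)
```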
With the advance of urbanization, urban transportation systems are crucial to the development of cities and the quality of life of citizens. Among related tasks, judging traffic congestion by analyzing congestion factors is one of the most important. Recently, various traditional and machine-learning-based models have been introduced to predict traffic congestion. However, these models either aggregate large-scale congestion factors poorly or fail to make accurate predictions for every precise location in a large-scale space. To alleviate these problems, this paper proposes a novel end-to-end framework based on convolutional neural networks. Through representation learning, the framework introduces a novel multimodal fusion module and a novel representation module to achieve traffic congestion prediction at arbitrary query locations on a large-scale map, combined with various global reference information. The proposed framework achieves significant results and efficient inference on a real-world large-scale dataset.
Automatic action quality assessment (AQA) has attracted increasing interest due to its wide applications. However, existing AQA methods usually adopt multi-branch models to generate multiple scores, which is not flexible for dealing with a variable number of judges. In this paper, we propose a novel Uncertainty-Driven AQA (UD-AQA) model to generate multiple predictions with only a single branch. Specifically, we design a CVAE (conditional variational auto-encoder)-based module to encode uncertainty, where multiple scores can be produced by sampling from the learned latent space multiple times. Moreover, we output an estimate of the uncertainty and utilize the predicted uncertainty to re-weight the AQA regression loss, which reduces the contribution of uncertain samples to training. We further design an uncertainty-guided training strategy to dynamically adjust the learning order of samples from low uncertainty to high uncertainty. Experiments show that our proposed method achieves new state-of-the-art results on the Olympic event dataset MTL-AQA and the surgical skill dataset JIGSAWS.
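One common way to re-weight a regression loss by predicted uncertainty is the heteroscedastic-style formulation sketched below; it is included only to illustrate the idea of down-weighting uncertain samples, and the paper's exact weighting may differ.

```python
import torch


def uncertainty_weighted_mse(pred, target, log_var):
    """Re-weight a squared-error loss by predicted uncertainty: samples with
    high predicted variance contribute less, and the log-variance penalty
    prevents the trivial solution of unbounded uncertainty."""
    precision = torch.exp(-log_var)
    return (precision * (pred - target) ** 2 + log_var).mean()
```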
In video denoising, adjacent frames often provide very useful information, but accurate alignment is required before such information can be exploited. In this work, we propose a multi-alignment network that generates multiple flow proposals followed by attention-based averaging. It serves to mimic the non-local mechanism, suppressing noise by averaging multiple observations. Our approach can be applied to various state-of-the-art models that are based on flow estimation. Experiments on a large-scale video dataset show that our method improves the denoising baseline model by 0.2 dB and further reduces the parameters by 47% through model distillation. Code is available at https://github.com/indigopurple/manet.
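A minimal sketch of the attention-based averaging step over several alignment proposals might look as follows; the tensor layout and layer sizes are assumptions chosen for clarity, not the released MANet code.

```python
import torch
import torch.nn as nn


class ProposalAttentionFusion(nn.Module):
    """Attention-based averaging over N warped candidates of the same frame:
    each proposal gets a per-pixel weight, and the weighted average mimics
    non-local noise suppression."""

    def __init__(self, channels, num_proposals):
        super().__init__()
        self.score = nn.Conv2d(channels * num_proposals, num_proposals, 1)

    def forward(self, proposals):
        # proposals: (B, N, C, H, W) -- N aligned candidates
        b, n, c, h, w = proposals.shape
        weights = torch.softmax(
            self.score(proposals.reshape(b, n * c, h, w)), dim=1)  # (B, N, H, W)
        return (weights.unsqueeze(2) * proposals).sum(dim=1)       # (B, C, H, W)
```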
The ability of snapshot compressive imaging (SCI) systems to efficiently capture high-dimensional (HD) data has led to an inverse problem, which consists of recovering the HD signal from the compressed and noisy measurement. While reconstruction algorithms have advanced rapidly with recent progress in deep learning, the fundamental issue of accurate and stable recovery remains. To this end, we propose deep equilibrium models (DEQ) for video SCI, fusing data-driven regularization and stable convergence in a theoretically sound manner. Each equilibrium model implicitly learns a nonexpansive operator and analytically computes the fixed point, thus enabling unlimited iterative steps and infinite network depth with only a constant memory requirement in training and testing. Specifically, we demonstrate how DEQ can be applied to two existing models for video SCI reconstruction: recurrent neural networks (RNN) and Plug-and-Play (PnP) algorithms. On a variety of datasets and real data, both quantitative and qualitative evaluations of our results demonstrate the effectiveness and stability of our proposed method. The code and models are available at: https://github.com/IndigoPurple/DEQSCI.
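The forward pass of a deep equilibrium model amounts to iterating a learned operator to a fixed point rather than stacking a fixed number of layers. A minimal sketch of that fixed-point iteration is given below; a practical DEQ additionally differentiates through the equilibrium implicitly, which this sketch omits.

```python
import torch


def deq_fixed_point(f, x0, max_iter=50, tol=1e-4):
    """Iterate a (nonexpansive) operator f to a fixed point z* = f(z*, x0).
    Memory stays constant because only the current iterate is kept."""
    z = torch.zeros_like(x0)
    for _ in range(max_iter):
        z_next = f(z, x0)
        # Stop when the relative update falls below the tolerance
        if torch.norm(z_next - z) / (torch.norm(z) + 1e-8) < tol:
            return z_next
        z = z_next
    return z
```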
Modern object detection networks pursue higher accuracy on general object detection datasets, while the computational burden grows along with the accuracy. However, inference time and accuracy are both critical for object detection systems that need to run in real time, so it is worthwhile to study accuracy improvements that incur no extra computational cost. In this work, two modules are proposed to improve detection accuracy at zero cost, which are improvements of the FPN and the detection head for general object detection networks. We employ a scale attention mechanism to efficiently fuse multi-level feature maps with fewer parameters, termed the SA-FPN module. Considering the correlation between the classification head and the regression head, we use a sequential head to replace the widely used parallel head, termed the Seq-Head module. To evaluate the effectiveness, we apply these two modules to several modern state-of-the-art object detection networks, including anchor-based and anchor-free ones. Experimental results on the COCO dataset show that networks equipped with the two modules surpass the original networks by 1.1 AP and 0.8 AP at zero cost for anchor-based and anchor-free networks, respectively. Code will be available at https://git.io/jtfgl.
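A lightweight way to realize scale-attention fusion of pyramid features is to weight each resized level by a learned per-level score before summation, as sketched below; the gating design here is an assumption for illustration, not the paper's exact SA-FPN module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleAttentionFusion(nn.Module):
    """Fuse FPN-style feature maps with a learned per-level attention score,
    adding very few parameters (a single linear gate)."""

    def __init__(self, channels, num_levels):
        super().__init__()
        self.gate = nn.Linear(channels * num_levels, num_levels)

    def forward(self, feats):
        # feats: list of (B, C, Hi, Wi), fused at the resolution of feats[0]
        size = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=size, mode="bilinear",
                                 align_corners=False) for f in feats]
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in resized], dim=1)  # (B, C*L)
        w = torch.softmax(self.gate(pooled), dim=1)                        # (B, L)
        return sum(w[:, i].view(-1, 1, 1, 1) * f for i, f in enumerate(resized))
```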
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
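One plausible form of such a regularizer pushes the pairwise cosine similarities of normalized class feature centers toward -1/(K-1), the maximally separated equiangular value of a simplex ETF. The sketch below illustrates this formulation; it is an assumption for exposition rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F


def etf_center_regularizer(centers):
    """Encourage K class feature centers (K, D) to approach a simplex-ETF-like
    geometry: all off-diagonal cosine similarities near -1/(K-1)."""
    k = centers.shape[0]
    normed = F.normalize(centers, dim=1)          # unit-norm centers
    cos = normed @ normed.t()                     # pairwise cosine similarities
    mask = 1.0 - torch.eye(k, device=centers.device)
    target = -1.0 / (k - 1)
    return (((cos - target) * mask) ** 2).sum() / (k * (k - 1))
```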
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma between classification and localization performance.
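A simple way to balance knowledge from two teachers is a weighted combination of temperature-scaled distillation terms, as sketched below; the weighting scheme and names are illustrative assumptions, not KD-CI-CAM's exact objective.

```python
import torch.nn.functional as F


def multi_teacher_distill_loss(student_logits, cls_teacher_logits,
                               loc_teacher_logits, alpha=0.5, temperature=2.0):
    """Balance soft targets from a classification-oriented teacher and a
    localization-oriented teacher via weighted, temperature-scaled KL terms."""
    t = temperature
    log_p = F.log_softmax(student_logits / t, dim=1)
    kl_cls = F.kl_div(log_p, F.softmax(cls_teacher_logits / t, dim=1),
                      reduction="batchmean")
    kl_loc = F.kl_div(log_p, F.softmax(loc_teacher_logits / t, dim=1),
                      reduction="batchmean")
    # Standard T^2 scaling keeps gradient magnitudes comparable across temperatures
    return (alpha * kl_cls + (1 - alpha) * kl_loc) * t * t
```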